Goto

Collaborating Authors

 data-efficient hierarchical reinforcement learning


Data-Efficient Hierarchical Reinforcement Learning

Neural Information Processing Systems

Hierarchical reinforcement learning (HRL) is a promising approach to extend traditional reinforcement learning (RL) methods to solve more complex tasks. Yet, the majority of current HRL methods require careful task-specific design and on-policy training, making them difficult to apply in real-world scenarios. In this paper, we study how we can develop HRL algorithms that are general, in that they do not make onerous additional assumptions beyond standard RL algorithms, and efficient, in the sense that they can be used with modest numbers of interaction samples, making them suitable for real-world problems such as robotic control. For generality, we develop a scheme where lower-level controllers are supervised with goals that are learned and proposed automatically by the higher-level controllers. To address efficiency, we propose to use off-policy experience for both higher-and lower-level training.


Reviews: Data-Efficient Hierarchical Reinforcement Learning

Neural Information Processing Systems

Summary The authors present a heirarchical reinforcement learning approach which learns at two levels, a higher level agent that is learning to perform actions in the form of medium term goals (changes in the state variable) and a low level agent that is aiming to (and rewarded for) achieving these medium term goals by performing atomic level actions. The key contributions identified by the authors are that learning at both lower and higher level are off-policy and take advantage of recent developments in off-policy learning. The authors say that the more challenging aspect of this, is the off policy learning at the higher level, as the actions (sub-goals) chosen during early experience are not effectively met by the low level policy. Their solution is to instead replace (or augment) high level experience with synthetic high level actions (sub-goals) which would be more likely to have happened based on the current instantiation of the low level controller. An additional key feature is that the sub-goals, rather than given in terms of absolute (observed) states, are instead given in terms of relative states (deltas), and there is a mechanism to update this sub-goal appropirately as the low level controller advances.


Data-Efficient Hierarchical Reinforcement Learning -- HIRO

#artificialintelligence

Traditional reinforcement learning algorithms have achieved encouraging success in recent years. Their nature of reasoning on the atomic scale, however, makes them hard to scale to complex tasks. Hierarchical Reinforcement Learning(HRL) introduces high-level abstraction, whereby the agent is able to plan on different scales. In this post, we discuss an HRL algorithm proposed by Ofir Nachum et al. in Google Brain at NIPS 2018. The algorithm, known as HIerarchical Reinforcement learning with Off-policy correction(HIRO), is designed for goal-directed tasks, in which the agent tries to reach some goal state.